# Guide Your Agent with Adaptive Multimodal Rewards

## Prerequisites
- Download the M3AE checkpoints from [this link](https://drive.google.com/drive/folders/1I4tD8wA4o0QiHSY-TsER9uq9rzbrokQf). We used `m3ae_base.pkl`.
- Then extract the params from the M3AE checkpoint as follows:
    ```python
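    # Set the base path as "./instructrl/models"
    import pickle

    # Load the full M3AE checkpoint ({checkpoint_path} is the download directory).
    with open('{checkpoint_path}/m3ae_base.pkl', 'rb') as f:
        model = pickle.load(f)

    # Keep only the model parameters from the saved training state.
    params = model["state"].params

    # Write the parameters to their own pickle file in {target_path}.
    with open('{target_path}/m3ae_base_params.pkl', 'wb') as g:
        pickle.dump(params, g)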
    ```
- Edit L1035-L1037 of `./instructrl/models/m3ae/model.py` so that they point to the directory where your M3AE params are saved.
- We conducted all experiments on Ubuntu 20.04.
- Your machine must have the following packages installed:
    - CUDA
    - CuDNN
    - Qt
    - libglfw
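
The parameter extraction step above can be wrapped in a small helper so that the output file can be verified right after it is written. This is only a minimal sketch: `extract_params` and `load_params` are hypothetical names that are not part of this repository, and the self-check uses a stand-in object rather than a real M3AE checkpoint. Loading the real `m3ae_base.pkl` may additionally require the packages its pickled objects depend on to be importable.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def extract_params(checkpoint_file, target_file):
    """Save `model["state"].params` from a checkpoint pickle as its own pickle."""
    with open(checkpoint_file, "rb") as f:
        model = pickle.load(f)
    params = model["state"].params
    target_file = Path(target_file)
    # Create the target directory if it does not exist yet.
    target_file.parent.mkdir(parents=True, exist_ok=True)
    with open(target_file, "wb") as g:
        pickle.dump(params, g)
    return params


def load_params(params_file):
    """Reload the extracted params to confirm that the file is readable."""
    with open(params_file, "rb") as f:
        return pickle.load(f)


# Self-check with a stand-in checkpoint (NOT a real M3AE checkpoint).
with tempfile.TemporaryDirectory() as tmp:
    fake_ckpt = Path(tmp) / "fake_checkpoint.pkl"
    fake_state = SimpleNamespace(params={"encoder": {"kernel": [0.1, 0.2]}})
    with open(fake_ckpt, "wb") as f:
        pickle.dump({"state": fake_state}, f)

    out_file = Path(tmp) / "out" / "fake_params.pkl"
    extract_params(fake_ckpt, out_file)
    assert load_params(out_file) == fake_state.params
```

For the real checkpoint, the call would look like `extract_params('{checkpoint_path}/m3ae_base.pkl', '{target_path}/m3ae_base_params.pkl')`, with the placeholders replaced by your own directories.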


## Installation
```bash
conda create -n procgen -y python=3.8
conda activate procgen

cd ./mrdt_procgen

pip install --no-cache-dir -r requirements.txt
pip install "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html gpustat ujson mpi4py

cd ./procgen_highres
pip install -e . && cd ..

cd ./procgen_highres_AISC
pip install -e . && cd ..
```

## Script
```bash
# Generate multimodal reward using CLIP
CUDA_VISIBLE_DEVICES=0 python label_reward.py --env_name {env_name} --env_type {env_type} --image_keys "ob" --data_path {data_path} --model_type clip

# Fine-tune CLIP
CUDA_VISIBLE_DEVICES=0 python3 -m action_finetune_module.finetune --use_vip_loss True --use_id_loss True --data.path {data_path} --default_root_dir {fine-tuned-CLIP_ckpt_save_path} --epochs 20 --model_type clip_multiscale_ensemble --game_name {env_name} --env_type {env_type} --data.image_key "ob" --lambda_id 1.5

# Generate multimodal reward using fine-tuned CLIP
CUDA_VISIBLE_DEVICES=0 python label_reward.py --env_name {env_name} --env_type {env_type} --image_keys "ob" --data_path {data_path} --model_type clip_multiscale_vip_id --model_ckpt_dir {fine-tuned-CLIP_ckpt_path}

# Train return-conditioned agent using CLIP
CUDA_VISIBLE_DEVICES=2,3 sh ./jobs/train_procgen.sh {data folder path} {env_name} {data_env_type} {eval_env_type} {augmentation} False {use_vl} clip "" 4 0.0 1000.0 {seed} {comment} False True 0.01

# Train return-conditioned agent using fine-tuned CLIP
CUDA_VISIBLE_DEVICES=0,1 sh ./jobs/train_procgen.sh {data folder path} {env_name} {data_env_type} {eval_env_type} {augmentation} False {use_vl} clip_multiscale_vip_id {fine-tuned-CLIP_ckpt_path} 4 0.0 1000.0 {seed} {comment} False True 0.01
```
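
When the reward-labeling command is launched from Python, passing the arguments as a list keeps each flag separated from the preceding value. The sketch below is an assumption-laden illustration: `build_label_reward_cmd` is a hypothetical helper that is not part of this repository, it only uses the `label_reward.py` flags shown in the commands above, and the values `ENV_NAME`, `ENV_TYPE`, and `/path/to/data` are placeholders.

```python
import shlex


def build_label_reward_cmd(env_name, env_type, data_path,
                           model_type="clip", model_ckpt_dir=None,
                           image_keys="ob"):
    """Return the label_reward.py invocation as an argument list."""
    cmd = [
        "python", "label_reward.py",
        "--env_name", env_name,
        "--env_type", env_type,
        "--image_keys", image_keys,
        "--data_path", data_path,
        "--model_type", model_type,
    ]
    # The checkpoint directory is only passed for the fine-tuned CLIP model.
    if model_ckpt_dir is not None:
        cmd += ["--model_ckpt_dir", model_ckpt_dir]
    return cmd


# Placeholder values; replace them with your own environment and data path.
cmd = build_label_reward_cmd("ENV_NAME", "ENV_TYPE", "/path/to/data")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, with `CUDA_VISIBLE_DEVICES` set through the `env` argument if a specific GPU is needed.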
